Project Statement

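This project focuses on the RFM model and applies it to customer segmentation. The RFM model is based on three quantitative factors: recency (how recently a customer made a purchase), frequency (how often they purchase), and monetary value (how much they spend).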

Data Exploration

Notice that the @timestamp column has the object data type. We need to convert it to a datetime type, keeping only the date part.

Also notice that uid has an int data type. Since it is an identifier rather than a quantity, we need to convert it to the category data type.
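The two conversions described above could look roughly like the following in pandas. This is a minimal sketch on a made-up two-row frame: only the column names `@timestamp` and `uid` come from this notebook, and the sample values are invented for illustration.

```python
import pandas as pd

# Made-up sample rows; only the column names come from the notebook.
df = pd.DataFrame({
    "@timestamp": ["2019-01-05 13:45:10", "2019-02-11 08:02:55"],
    "uid": [1001, 1002],
})

# Parse the strings as datetimes, then zero out the time of day so that
# only the date part remains (the dtype stays datetime64).
df["@timestamp"] = pd.to_datetime(df["@timestamp"]).dt.normalize()

# Treat the user id as a label rather than a number.
df["uid"] = df["uid"].astype("category")

print(df.dtypes)
```

Using `.dt.normalize()` keeps a true datetime dtype, which makes later date arithmetic easier; `.dt.date` would also drop the time but yields plain Python objects.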

From the histogram above, we can see that most customers spend $0-$400 on games.

Feature Engineering
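As a rough illustration of this step, the three RFM values can be derived from a transaction table with one `groupby`. This is a sketch under assumptions: the `amount` column, the sample rows, and the choice of snapshot date (one day after the last transaction) are hypothetical and not taken from the project's data.

```python
import pandas as pd

# Hypothetical transactions; "amount" is an assumed column name.
tx = pd.DataFrame({
    "uid": [1, 1, 2, 3, 3, 3],
    "@timestamp": pd.to_datetime([
        "2019-01-01", "2019-03-01", "2019-02-15",
        "2019-01-10", "2019-02-20", "2019-03-10",
    ]),
    "amount": [20.0, 35.0, 400.0, 5.0, 12.5, 7.5],
})

# Reference date for recency: assumed to be the day after the last record.
snapshot = tx["@timestamp"].max() + pd.Timedelta(days=1)

rfm = tx.groupby("uid").agg(
    # Days since the customer's most recent purchase.
    recency=("@timestamp", lambda d: (snapshot - d.max()).days),
    # Number of purchases.
    frequency=("@timestamp", "count"),
    # Total amount spent.
    monetary=("amount", "sum"),
)

print(rfm)
```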

Data Visualization

From the subplots above, we can see that R_score is distributed relatively evenly, whereas F_score and M_score are distributed unevenly: most customers fall in the $0-$19.9k spending group and the 0-19 purchase-frequency group.
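A skew like this is easy to reproduce when raw values are cut into equal-width groups. The sketch below assumes equal-width binning with `pd.cut`; how the scores were actually binned is not shown here, and the spending values are invented for illustration.

```python
import pandas as pd

# Invented total-spend values with a long right tail.
monetary = pd.Series([50, 120, 300, 800, 1500, 19000, 45000, 99000])

# Assumption: five equal-width bins, labelled 1 (lowest) to 5 (highest).
m_score = pd.cut(monetary, bins=5, labels=[1, 2, 3, 4, 5])

# Count how many customers land in each score group.
counts = m_score.value_counts().sort_index()
print(counts)
```

Because the bins have equal width, a few very large spenders stretch the range and push almost everyone else into the lowest group.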

Customer Segmentation

We could label a customer 'high value' or 'low value' by comparing against the mean value. Here, however, I am going to use the K-Means machine learning algorithm to segment customers.

Best k

elbow method

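I am using the distortion score in this project, commonly defined as the sum of squared distances from each point to its assigned cluster center.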
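To make the elbow curve concrete, here is a minimal sketch that computes this score for a range of k values with scikit-learn, which exposes it as `inertia_` on a fitted `KMeans` model. The data is synthetic (three well-separated groups standing in for R, F, and M scores), so the numbers are not the project's results.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for (R_score, F_score, M_score): three separated groups.
rng = np.random.default_rng(42)
centers = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [5.0, 5.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(100, 3)) for c in centers])

ks = list(range(1, 8))
distortions = []
for k in ks:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the sum of squared distances to the closest center.
    distortions.append(model.inertia_)

for k, d in zip(ks, distortions):
    print(k, round(d, 1))
```

Plotting `distortions` against `ks` gives the elbow curve: the score drops sharply until k reaches the number of natural groups and flattens afterwards.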

elbow point for R_score

elbow point for F_score

elbow point for M_score

elbow point for R_score, F_score, M_score

From the elbow curves, we can see that the best k for the R, F, and M values is around 2 to 3. This time I am using the Python package kneed to identify the elbow point programmatically.
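To show what "identifying the elbow programmatically" means, the sketch below uses a simplified stand-in for what kneed automates: after rescaling both axes to [0, 1], it picks the k whose point lies farthest below the straight line joining the ends of the curve. The function name `find_elbow` and the distortion values are hypothetical; kneed's `KneeLocator` class (with `curve="convex"` and `direction="decreasing"` for a curve of this shape) is the fuller implementation.

```python
import numpy as np

def find_elbow(ks, distortions):
    """Return the k farthest below the chord of a decreasing, convex curve.

    Simplified heuristic for illustration; assumes the first distortion is
    the largest and the last is the smallest.
    """
    x = np.asarray(ks, dtype=float)
    y = np.asarray(distortions, dtype=float)
    # Rescale both axes to [0, 1] so that neither dominates.
    x_n = (x - x.min()) / (x.max() - x.min())
    y_n = (y - y.min()) / (y.max() - y.min())
    # The chord runs from (0, 1) to (1, 0), i.e. y = 1 - x.
    gap = (1.0 - x_n) - y_n
    return int(x[np.argmax(gap)])

# Hypothetical distortion scores for k = 1..6.
ks = [1, 2, 3, 4, 5, 6]
distortions = [1000.0, 400.0, 150.0, 120.0, 100.0, 90.0]
best_k = find_elbow(ks, distortions)
print(best_k)
```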

We choose 3 as the best k value, so we can segment customers into 3 clusters (High, Middle, Low).

Build and Train the model
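A minimal sketch of this step with scikit-learn follows. It assumes that higher scores are better on all three axes and names the clusters by ranking their centers; the data is synthetic and deliberately unbalanced, and the `RFM_level` column name is an assumption.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic scores: a large low group and small middle and high groups.
rng = np.random.default_rng(0)
centers = np.array([[1.0, 1.0, 1.0], [3.0, 3.0, 3.0], [5.0, 5.0, 5.0]])
sizes = [300, 40, 10]
X = np.vstack([c + rng.normal(scale=0.3, size=(n, 3))
               for c, n in zip(centers, sizes)])
cols = ["R_score", "F_score", "M_score"]
rfm = pd.DataFrame(X, columns=cols)

# Fit K-Means with k = 3 and store each customer's cluster id.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
rfm["cluster"] = model.fit_predict(rfm[cols])

# Cluster ids are arbitrary, so rank clusters by the sum of their center
# coordinates (assumption: higher scores mean a more valuable customer).
order = np.argsort(model.cluster_centers_.sum(axis=1))
names = {int(c): name for c, name in zip(order, ["Low", "Middle", "High"])}
rfm["RFM_level"] = rfm["cluster"].map(names)

print(rfm["RFM_level"].value_counts())
```

Ranking the centers before naming them matters because K-Means assigns cluster ids in no particular order, so cluster 0 is not necessarily the low-value group.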

RFM Level

From the bar chart above, we can see that the number of low-value customers is much higher than the number of middle-value and high-value customers.

Conclusions

By applying the K-Means machine learning algorithm to the RFM model for customer segmentation, we find that most of our low-value customers (99%) fall in the $0-$19.9k spending group and the 0-19 purchase-frequency group. Differentiated and personalized marketing strategies can then be applied to the different customer segments.